AITopics | mel-spectrogram sequence

Collaborating Authors

mel-spectrogram sequence

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

FastSpeech: Fast, Robust and Controllable Text to Speech

Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Neural Information Processing SystemsFeb-15-2026, 03:49:24 GMT

Prominent methods (e.g., Tacotron 2)usuallyfirst generate mel-spectrogram from text, and then synthesize speech from themel-spectrogram using vocoder such as WaveNet. Compared with traditionalconcatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech isusually not robust (i.e., some words are skipped or repeated) and lack of con-trollability (voice speed or prosody control).

artificial intelligence, fastspeech, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > China (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Add feedback

FastSpeech: Fast, Robust and Controllable Text to Speech

Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Neural Information Processing SystemsAug-20-2025, 09:32:38 GMT

Prominent methods (e.g., Tacotron 2) usually first generate mel-spectrogram from text, and then synthesize speech from the mel-spectrogram using vocoder such as WaveNet. Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lack of con-trollability (voice speed or prosody control).

fastspeech, mel-spectrogram sequence, sequence, (10 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
North America > Canada (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

FastSpeech: New text-to-speech model improves on speed, accuracy, and controllability - Microsoft Research

#artificialintelligenceDec-12-2019, 20:05:32 GMT

Text to speech (TTS) has attracted a lot of attention recently due to advancements in deep learning. Neural network-based TTS models (such as Tacotron 2, DeepVoice 3 and Transformer TTS) have outperformed conventional concatenative and statistical parametric approaches in terms of speech quality. Neural network-based TTS models usually first generate a mel-scale spectrogram (or mel-spectrogram) autoregressively from text input and then synthesize speech from the mel-spectrogram using a vocoder. A spectrogram is a visual representation of frequencies measured over time.) To address the above problems, researchers from Microsoft and Zhejiang University propose FastSpeech, a novel feed-forward network that generates mel-spectrograms with fast generation speed, robustness, controllability, and high quality.

fastspeech, length regulator, sequence, (14 more...)

#artificialintelligence

Genre: Summary/Review (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback